In search of robust measures of generalization

Neural Information Processing Systems

One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
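The abstract's central contrast, average-case versus distributionally robust evaluation of a generalization measure, can be illustrated with a small synthetic sketch. This is not the paper's actual protocol: the environments, the data, and the `sign_agreement` scoring function below are all illustrative stand-ins, loosely inspired by the pairwise sign-error style of analysis in Jiang et al. (2020).

```python
import numpy as np

# Hypothetical setup: several "environments" (e.g., dataset or hyperparameter
# variants), each containing a group of trained models. For every model we
# have a candidate generalization measure and an observed generalization gap.
rng = np.random.default_rng(0)
n_envs, n_models = 5, 40
measures, gaps = [], []
for e in range(n_envs):
    m = rng.normal(size=n_models)                     # candidate measure per model
    g = m + rng.normal(scale=0.5 + e, size=n_models)  # gap; correlation degrades with e
    measures.append(m)
    gaps.append(g)

def sign_agreement(m, g):
    """Fraction of model pairs on which the measure ranks the gap correctly."""
    m, g = np.asarray(m, float), np.asarray(g, float)
    i, j = np.triu_indices(len(m), k=1)
    return float(np.mean(np.sign(m[i] - m[j]) == np.sign(g[i] - g[j])))

per_env = [sign_agreement(m, g) for m, g in zip(measures, gaps)]
print("average-case score :", round(float(np.mean(per_env)), 3))
print("robust (worst-env) :", round(min(per_env), 3))
```

Averaging across environments can mask a setting in which the measure fails entirely; taking the worst environment's score, in the spirit of distributional robustness, surfaces that failure.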


Reviewer

Neural Information Processing Systems

The authors claim that RSWL is a general framework, yet they only apply it to PCA. Can the parameter k be chosen automatically, perhaps via some cross-validation method? Automatically learning the hyper-parameter k would be an interesting future direction. How should the reduced dimension d be chosen? Regarding fairness, I can imagine a naive method that assigns all of the weight (i.e., 1) to a single data point; in other words, the sum of all the weights for the comparative methods should be 1 for a fair comparison.


Review for NeurIPS paper: In search of robust measures of generalization

Neural Information Processing Systems

This paper evaluates various "generalization measures" --- numbers computed from the training data, the training algorithm, and network properties --- in terms of how well they predict generalization. The work builds on the prior work of Jiang et al. (their [6]) in ways the authors clearly define, and thus provides a new set of results on similar questions. Their changes are interesting, and since generalization of deep networks is of such extensive interest to so many, I feel these results will be valuable. I look forward to seeing this paper appear, and support the authors in future work. Since IMO figure 1 is the core of this paper, I think it would be reasonable to spend more time explaining figure 1 and even expanding it, in the process shortening some other material and moving it to the appendices.



Building Outlier-Resistant Centroids in Any Dimension

@machinelearnbot

In this article, we also discuss an interesting physics problem: finding the point of maximum or minimum light, sound, radioactivity, or heat intensity, in the presence of an energy field produced by n energy source points. However, the main focus here is on finding the point that minimizes the sum of the "distances" to n points in a d-dimensional space. Both problems are closely related and use the same algorithm to find solutions. The sum of "distances" between an arbitrary point (u, v) and a set S = { (x(1), y(1)), ..., (x(n), y(n)) } of n points is defined as H(u, v) = SUM over k = 1..n of [ (u - x(k))^2 + (v - y(k))^2 ]^(p/2). The function H has one parameter p, called the power; when p = 2, we are facing the traditional problem of finding the centroid of a cloud of points, and in this case the solution is the classic average of the n points. This solution is notoriously sensitive to outliers.
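The outlier sensitivity of the p = 2 solution (the plain average) can be checked with a short sketch. This is not the article's own algorithm: it handles only the p = 1 case, for which the minimizer of H is the geometric median, computed here with the classic Weiszfeld iteration; the function names and the sample cloud are illustrative.

```python
import numpy as np

def geometric_median(points, iters=100):
    """Minimizer of H(u, v) with p = 1 (sum of plain Euclidean distances),
    via the Weiszfeld iteration: repeatedly take a weighted average where
    each point's weight is the inverse of its current distance to the center."""
    pts = np.asarray(points, dtype=float)
    c = pts.mean(axis=0)  # start from the p = 2 solution, the plain average
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(pts - c, axis=1), 1e-12)  # guard /0
        w = 1.0 / d
        c = (w[:, None] * pts).sum(axis=0) / w.sum()
    return c

# A tight unit cluster plus one extreme outlier:
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean_center = np.mean(cloud, axis=0)     # p = 2: dragged far toward the outlier
median_center = geometric_median(cloud)  # p = 1: stays near the cluster
```

On this cloud the average lands around (20, 20), pulled by the single outlier, while the geometric median remains inside the unit cluster, which is the robustness the article is after.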